# Low VRAM inference
## Bielik 4.5B V3.0 Instruct FP8 Dynamic

- License: Apache-2.0
- Author: speakleash
- Tags: Large Language Model, Other
- Downloads: 74 · Likes: 1

The FP8-quantized version of Bielik-4.5B-v3.0-Instruct. It uses AutoFP8 to quantize weights and activations to the FP8 data type, reducing disk space and GPU memory requirements by approximately 50%.
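Producing a checkpoint like this follows the AutoFP8 workflow. Below is a minimal sketch under stated assumptions: it uses the `auto-fp8` package's documented API, the output directory name is hypothetical, and the "dynamic" activation scheme computes activation scales at runtime, so no calibration data is needed.

```python
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# FP8 dynamic quantization sketch: weights are converted offline, while
# activation scales are computed per run, so calibration data is unnecessary.
quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="dynamic",
)

model = AutoFP8ForCausalLM.from_pretrained(
    "speakleash/Bielik-4.5B-v3.0-Instruct", quantize_config=quantize_config
)
model.quantize([])  # empty calibration set is fine for the dynamic scheme
model.save_quantized("./Bielik-4.5B-v3.0-Instruct-FP8-Dynamic")  # hypothetical path
```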
## Bielik 1.5B V3.0 Instruct FP8 Dynamic

- License: Apache-2.0
- Author: speakleash
- Tags: Large Language Model, Other
- Downloads: 31 · Likes: 1

An FP8 dynamic quantization of Bielik-1.5B-v3.0-Instruct, packaged for the vLLM and SGLang inference frameworks. AutoFP8 reduces parameter precision from 16 bits to 8 bits, significantly lowering disk space and GPU VRAM requirements.
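vLLM reads the quantization settings embedded in such checkpoints, so loading needs no extra flags on FP8-capable GPUs. A minimal sketch, assuming the repo id matches the card title:

```python
from vllm import LLM, SamplingParams

# Load the FP8-dynamic checkpoint; vLLM picks up the quantization
# configuration from the model's config files automatically.
llm = LLM(model="speakleash/Bielik-1.5B-v3.0-Instruct-FP8-Dynamic")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Opowiedz krótko o Wiśle."], params)
print(outputs[0].outputs[0].text)
```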
## Gemma 3 27b It Qat GGUF

- License: MIT
- Author: ubergarm
- Tags: Large Language Model
- Downloads: 852 · Likes: 9

A GGUF quantization of the Gemma-3-27B instruction-tuned model built from its quantization-aware-training (QAT) weights, using advanced non-linear quantization schemes to retain high-quality text generation at low bit widths.
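GGUF files run on llama.cpp-based runtimes. A minimal llama-cpp-python sketch; the quant filename is hypothetical, and some non-linear quant types in repos like this may require a llama.cpp fork that supports them:

```python
from llama_cpp import Llama

# Load a GGUF quant of Gemma 3 27B; the filename below is hypothetical.
llm = Llama(
    model_path="./gemma-3-27b-it-qat-IQ4_XS.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU; reduce to fit less VRAM
    n_ctx=8192,       # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain QAT quantization in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```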
## Qwq 32B Bnb 4bit

- License: Apache-2.0
- Author: onekq-ai
- Tags: Large Language Model, Transformers
- Downloads: 167 · Likes: 2

A 4-bit quantized version of QwQ-32B, optimized using bitsandbytes, suitable for efficient inference in resource-constrained environments.
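Pre-quantized bitsandbytes checkpoints load directly through Transformers, which restores the 4-bit layers from the quantization config stored in the repo. A minimal sketch, assuming the repo id from the card title; requires the `bitsandbytes` package and a CUDA GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "onekq-ai/QwQ-32B-bnb-4bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The 4-bit quantization config ships inside the checkpoint, so no
# explicit BitsAndBytesConfig is needed for a pre-quantized repo.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized ops
)

inputs = tokenizer("Think step by step: what is 17 * 24?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```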
## Cogvideox1.5 5B

- License: Other
- Author: THUDM
- Tags: Text-to-Video, English
- Downloads: 11.12k · Likes: 36

CogVideoX is an open-source video generation model from the same family as the Qingying service, supporting high-resolution text-to-video generation.
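Given this section's low-VRAM focus, here is a minimal diffusers sketch with CPU offload and VAE tiling enabled; the prompt and sampling settings are illustrative only:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep only the active submodule on the GPU
pipe.vae.enable_tiling()         # decode frames in tiles to cap peak VRAM

video = pipe(
    prompt="A golden retriever runs through a sunlit meadow",
    num_inference_steps=50,
    num_frames=81,
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```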
## Dorna Llama3 8B Instruct Quantized4Bit

- Author: amirMohammadi
- Tags: Large Language Model, Transformers, Supports Multiple Languages
- Downloads: 22 · Likes: 11

A 4-bit quantized version of Dorna-Llama3-8B-Instruct, a Llama-3 model tuned for Persian, using Flash Attention 2 for improved inference efficiency.
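Flash Attention 2 is selected at model-load time in Transformers. A minimal sketch, assuming the repo id from the card title; requires the `flash-attn` package and an Ampere-or-newer GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amirMohammadi/Dorna-Llama3-8B-Instruct-Quantized4Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# attn_implementation="flash_attention_2" swaps in the FlashAttention-2 kernels.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

messages = [{"role": "user", "content": "سلام! یک شعر کوتاه درباره بهار بنویس."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```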